DEPR: deprecate SparseArray.values #26421

jorisvandenbossche · 2019-05-16T08:10:28Z

Having a .values attribute on SparseArray is confusing, as .values is typically used on Series/DataFrame/Index and not on the array classes.
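As a rough illustration of what such a deprecation tends to look like, here is a minimal sketch using a stand-in class. `ToySparseArray`, its `to_dense` method, and the warning text are hypothetical and are not taken from the pandas source; only the general pattern (a property that warns with `FutureWarning` and then returns the dense data) is shown.

```python
import warnings


class ToySparseArray:
    """Hypothetical stand-in for an array class; not the pandas implementation."""

    def __init__(self, data, fill_value=0):
        self._data = list(data)
        self.fill_value = fill_value

    def to_dense(self):
        # Return a plain dense copy of the stored values.
        return list(self._data)

    @property
    def values(self):
        # Deprecated accessor: warn, then fall back to the dense conversion.
        warnings.warn(
            "ToySparseArray.values is deprecated; use to_dense() instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self.to_dense()


arr = ToySparseArray([0, 0, 1, 2])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dense = arr.values  # still works, but emits a FutureWarning
```

Existing callers keep working during the deprecation period, while the warning points them at the replacement.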

codecov · 2019-05-16T08:50:42Z

Codecov Report

Merging #26421 into master will decrease coverage by <.01%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master   #26421      +/-   ##
==========================================
- Coverage   91.69%   91.68%   -0.01%     
==========================================
  Files         174      174              
  Lines       50741    50743       +2     
==========================================
- Hits        46529    46526       -3     
- Misses       4212     4217       +5

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.19% <66.66%> (ø)` | ⬆️ |
| #single | `41.16% <0%> (-0.18%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/sparse/frame.py | `95.63% <100%> (ø)` | ⬆️ |
| pandas/core/ops.py | `94.68% <100%> (ø)` | ⬆️ |
| pandas/util/testing.py | `90.6% <100%> (-0.11%)` | ⬇️ |
| pandas/core/arrays/sparse.py | `92.71% <40%> (+0.01%)` | ⬆️ |
| pandas/io/gbq.py | `78.94% <0%> (-10.53%)` | ⬇️ |
| pandas/core/frame.py | `97.02% <0%> (-0.12%)` | ⬇️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 421ae9d...1865863. Read the comment docs.

codecov · 2019-05-16T08:50:50Z

Codecov Report

Merging #26421 into master will increase coverage by <.01%.
The diff coverage is 70%.

@@            Coverage Diff             @@
##           master   #26421      +/-   ##
==========================================
+ Coverage   91.74%   91.75%   +<.01%     
==========================================
  Files         174      174              
  Lines       50763    50754       -9     
==========================================
- Hits        46575    46567       -8     
+ Misses       4188     4187       -1

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.26% <70%> (ø)` | ⬆️ |
| #single | `41.71% <10%> (-0.08%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/internals/managers.py | `93.93% <ø> (ø)` | ⬆️ |
| pandas/core/sparse/frame.py | `95.63% <100%> (ø)` | ⬆️ |
| pandas/util/testing.py | `90.7% <100%> (+0.1%)` | ⬆️ |
| pandas/core/ops.py | `94.68% <100%> (ø)` | ⬆️ |
| pandas/core/arrays/sparse.py | `93.08% <50%> (+0.38%)` | ⬆️ |
| pandas/io/gbq.py | `78.94% <0%> (-10.53%)` | ⬇️ |
| pandas/core/frame.py | `97.02% <0%> (-0.12%)` | ⬇️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44d5498...fb3aebe. Read the comment docs.

jreback · 2019-05-16T12:33:20Z

pandas/core/ops.py

@@ -2272,10 +2272,10 @@ def _cast_sparse_series_op(left, right, opname):
    # TODO: This should be moved to the array?
    if is_integer_dtype(left) and is_integer_dtype(right):
        # series coerces to float64 if result should have NaN/inf
-        if opname in ('floordiv', 'mod') and (right.values == 0).any():
+        if opname in ('floordiv', 'mod') and (right.to_dense() == 0).any():


Should we not be using np.asarray generally, rather than .to_dense()?

Both are equivalent (although to_dense actually does a bit less, as it specifies the dtype whereas asarray does some inference; not sure about that difference, though).

jorisvandenbossche · 2019-05-16T20:05:53Z

cc @TomAugspurger since you are most familiar with Sparse nowadays .. (although reluctantly :-))

Removing this here also further untangles the get_values / values mess a bit, as SparseArray is still the only array with .values, and in some places we do hasattr or getattr on 'values', which then catches SparseArray ..
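To make the hasattr point concrete, here is a small sketch under stated assumptions: `ToyArray` and `unwrap` are hypothetical names, not pandas code. It shows that a duck-typed `hasattr(obj, "values")` check evaluates the property getter, so it both matches the array class and fires its deprecation warning.

```python
import warnings


class ToyArray:
    """Hypothetical array class with a deprecated ``values`` property."""

    @property
    def values(self):
        warnings.warn("ToyArray.values is deprecated", FutureWarning, stacklevel=2)
        return [1, 2, 3]


def unwrap(obj):
    # Duck-typed helper of the kind the comment mentions: anything exposing
    # ``values`` is treated like a Series/Index container.
    if hasattr(obj, "values"):  # hasattr evaluates the property getter
        return "container"
    return "plain"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kind = unwrap(ToyArray())
```

Here `kind` ends up as `"container"` and one `FutureWarning` is recorded, even though the caller never asked for the values themselves.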

TomAugspurger · 2019-05-16T20:21:56Z

+1

Looks like a few warnings still https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=11542&view=logs&jobId=521b7dfd-2989-5ff8-bc8c-7481906480fa&taskId=07b8d9d4-6363-5e2d-bc2b-146a30521256&lineStart=154&lineEnd=154&colStart=109&colEnd=115

My other PR is adding

filterwarnings =
    error:Sparse:FutureWarning

to our setup.cfg. If you make the error message something like SparseArray.values, these warnings would be elevated to errors too (not sure if we want that or not).
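The effect of the message prefix can be sketched with the standard library's `warnings` module, whose filters follow the same `action:message:category` idea: the message field is a regular expression that must match the start of the warning text. The helper name `is_elevated` and the two example messages below are assumptions for illustration, not the wording used in pandas.

```python
import warnings


def is_elevated(message):
    """Return True if a FutureWarning with this text is turned into an error."""
    with warnings.catch_warnings():
        # Baseline: silence everything, then add the filter under discussion.
        warnings.simplefilter("ignore")
        # Rough equivalent of ``error:Sparse:FutureWarning``.
        warnings.filterwarnings("error", message="Sparse", category=FutureWarning)
        try:
            warnings.warn(message, FutureWarning, stacklevel=2)
        except FutureWarning:
            return True
        return False


starts_with_sparse = is_elevated("SparseArray.values is deprecated")
starts_with_other = is_elevated("The SparseArray.values attribute is deprecated")
```

Only the first message is raised as an error, because the pattern is anchored at the beginning of the text. That is why the wording of the deprecation message decides whether the filter picks it up.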

jorisvandenbossche · 2019-05-17T12:25:34Z

Ah, I missed the apply ones.
It's quite annoying that the output on our CI does not show which test is causing it ... (due to using xdist).

There is one (that I actually already knew about, but for now ignored) that is not that easy to solve: the json code (ujson/python/objToJSON.c) checks in C for a 'values' attribute to get the values out of dataframe / series / index etc.

jorisvandenbossche · 2019-05-20T11:51:41Z

@TomAugspurger @jreback can you have a new look? I added some extra compat code in cython/c code

TomAugspurger · 2019-05-20T13:13:30Z

pandas/_libs/reduction.pyx

@@ -28,6 +28,14 @@ cdef _get_result_array(object obj, Py_ssize_t size, Py_ssize_t cnt):
    return np.empty(size, dtype='O')


+cdef bint _is_sparse_array(object obj):


Would this be better-suited for pandas._libs.util? Or keep here since this is the only file using it and it's temporary?

Yes, exactly for those reasons (it's only used here, and should be removed again once we get rid of this deprecation), I would keep it here (it's not meant to be a general utility).

this is not the right location; it should be in util
your argument is not correct; just because we will eventually remove it does not mean it shouldn't be with similar code

TomAugspurger · 2019-05-20T13:19:08Z

+1


jreback

not really sure of the urgency here @jorisvandenbossche

i have some comments - and will fully review at some point

jreback · 2019-05-21T10:10:25Z

pandas/_libs/reduction.pyx

@@ -28,6 +28,14 @@ cdef _get_result_array(object obj, Py_ssize_t size, Py_ssize_t cnt):
    return np.empty(size, dtype='O')


+cdef bint _is_sparse_array(object obj):


this is not the right location; it should be in util
your argument is not correct; just because we will eventually remove it does not mean it shouldn't be with similar code

jreback · 2019-05-21T10:10:59Z

pandas/_libs/reduction.pyx

@@ -28,6 +28,14 @@ cdef _get_result_array(object obj, Py_ssize_t size, Py_ssize_t cnt):
    return np.empty(size, dtype='O')


+cdef bint _is_sparse_array(object obj):
+    # TODO can be removed one SparseArray.values is removed (GH26421)
+    if hasattr(obj, '_subtyp'):


this idiom should be getattr
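A sketch of the two idioms being contrasted, written in plain Python rather than Cython. The hunk above does not show what follows the `hasattr` line, so the comparison against `"sparse_array"` is an assumption, and `ToySparse` / `ToyDense` are hypothetical classes.

```python
class ToySparse:
    # Marker attribute; the value used here is an assumption for illustration.
    _subtyp = "sparse_array"


class ToyDense:
    pass


def is_sparse_array_hasattr(obj):
    # Two lookups: one for the existence check, one for the comparison.
    if hasattr(obj, "_subtyp"):
        return obj._subtyp == "sparse_array"
    return False


def is_sparse_array_getattr(obj):
    # One lookup with a default, which is the idiom requested in review.
    return getattr(obj, "_subtyp", None) == "sparse_array"
```

Both helpers give the same answer for these inputs; the `getattr` form is shorter and touches the attribute only once.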

jorisvandenbossche · 2019-05-21T10:16:34Z

Sorry, there was no urgency at all. I just thought for a moment that Tom's review was enough, and wanted to get this PR over with. I will wait for your full review before doing any fixups.

jreback · 2019-05-26T15:05:09Z

@jorisvandenbossche my main comment was that _is_sparse_array needs to be in util.pyx (it doesn't matter that we will eventually remove it); it's in the wrong place.

There is slight confusion about whether we recommend .to_dense() or np.array() for conversions; we should try to be consistent (maybe just deprecate .to_dense()), but that is another issue (maybe create one).

DEPR: deprecate SparseArray.values

1865863

jorisvandenbossche added the Deprecate Functionality to remove in pandas label May 16, 2019

jorisvandenbossche added this to the 0.25.0 milestone May 16, 2019

jorisvandenbossche mentioned this pull request May 16, 2019

Deprecate SparseDataFrame and SparseSeries #26137

Merged

jorisvandenbossche added 3 commits May 16, 2019 14:05

fix extension array .where case

1018049

Merge remote-tracking branch 'upstream/master' into depr-sparse-values

781fcd0

add PR number to whatsnew

979a3fe

jreback reviewed May 16, 2019

fix warning in lib.reduction

e3737c0

jorisvandenbossche added 5 commits May 17, 2019 14:31

fix json warning

09d5122

Merge remote-tracking branch 'upstream/master' into depr-sparse-values

baeda5f

linting

9f61b73

add filterwarning

1f9e5fb

trigger azure

fb3aebe

TomAugspurger approved these changes May 20, 2019

jorisvandenbossche merged commit d3a1912 into pandas-dev:master May 21, 2019

jorisvandenbossche deleted the depr-sparse-values branch May 21, 2019 06:23

jreback reviewed May 21, 2019

jorisvandenbossche mentioned this pull request Dec 4, 2019

DEPR: remove get_values, SparseArray.values #29989

Merged
